
    Integrating Information Theory and Adversarial Learning for Cross-modal Retrieval

    Get PDF
    Accurately matching visual and textual data in cross-modal retrieval has been widely studied in the multimedia community. To address the challenges posed by the heterogeneity gap and the semantic gap, we propose integrating Shannon information theory and adversarial learning. For the heterogeneity gap, we integrate modality classification and information-entropy maximization adversarially. To this end, a modality classifier (acting as a discriminator) is built to distinguish the text and image modalities according to their different statistical properties. This discriminator uses its output probabilities to compute Shannon information entropy, which measures the uncertainty of the modality classification it performs. The feature encoders (acting as a generator) project uni-modal features into a commonly shared space and attempt to fool the discriminator by maximizing its output information entropy. Maximizing information entropy thus gradually reduces the distribution discrepancy of cross-modal features, driving the model toward a domain-confusion state in which the discriminator cannot classify the two modalities confidently. To reduce the semantic gap, Kullback-Leibler (KL) divergence and a bi-directional triplet loss are used to associate the intra- and inter-modality similarity between features in the shared space. Furthermore, a regularization term based on KL divergence with temperature scaling is used to calibrate the label classifier, which is biased by the data-imbalance issue. Extensive experiments with four deep models on four benchmarks demonstrate the effectiveness of the proposed approach. Comment: Accepted by Pattern Recognition.
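
    As a rough illustration of the entropy-maximization scheme described above, the sketch below pairs a modality discriminator trained with cross-entropy against encoders trained to maximize the discriminator's output entropy. It is a minimal PyTorch sketch under assumed names and layer sizes, not the paper's implementation.

```python
# Minimal sketch of adversarial entropy maximization for cross-modal features.
# ModalityDiscriminator, layer sizes, and the loss wiring are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityDiscriminator(nn.Module):
    """Predicts whether a shared-space feature came from the image or the text encoder."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)  # logits over {image, text}

def output_entropy(logits: torch.Tensor) -> torch.Tensor:
    # Shannon entropy of the discriminator's predictive distribution (higher = more confused).
    p = F.softmax(logits, dim=1)
    return -(p * torch.log(p + 1e-8)).sum(dim=1).mean()

def discriminator_loss(disc: ModalityDiscriminator, z_img, z_txt) -> torch.Tensor:
    # Discriminator step: learn to tell the two modalities apart.
    logits = disc(torch.cat([z_img, z_txt], dim=0))
    labels = torch.cat([torch.zeros(len(z_img), dtype=torch.long),
                        torch.ones(len(z_txt), dtype=torch.long)]).to(logits.device)
    return F.cross_entropy(logits, labels)

def encoder_adversarial_loss(disc: ModalityDiscriminator, z_img, z_txt) -> torch.Tensor:
    # Encoder (generator) step: maximize the discriminator's entropy, pushing toward
    # the domain-confusion state where modalities are indistinguishable.
    logits = disc(torch.cat([z_img, z_txt], dim=0))
    return -output_entropy(logits)  # minimizing this maximizes entropy
```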

    Optimal multi-scale matching

    Get PDF
    Abstract: The coarse-to-fine search strategy is extensively used in currently reported research

    Lifelong Person Re-Identification via Adaptive Knowledge Accumulation

    Full text link
    Person ReID methods always learn in a stationary domain that is fixed by the choice of a given dataset. In many contexts (e.g., lifelong learning), those methods are ineffective because the domain changes continually, and incremental learning over multiple domains is potentially required. In this work we explore a new and challenging ReID task, namely lifelong person re-identification (LReID), which enables learning continuously across multiple domains and even generalising to new and unseen domains. Following the cognitive processes of the human brain, we design an Adaptive Knowledge Accumulation (AKA) framework endowed with two crucial abilities: knowledge representation and knowledge operation. Our method alleviates catastrophic forgetting on seen domains and demonstrates the ability to generalise to unseen domains. Correspondingly, we also provide a new and large-scale benchmark for LReID. Extensive experiments demonstrate that our method outperforms other competitors by a margin of 5.8% mAP in the generalisation evaluation. Comment: 10 pages, 5 figures, Accepted by CVPR 2021.

    Dual Gaussian-based Variational Subspace Disentanglement for Visible-Infrared Person Re-Identification

    Get PDF
    Visible-infrared person re-identification (VI-ReID) is a challenging and essential task in night-time intelligent surveillance systems. In addition to the intra-modality variance that RGB-RGB person re-identification mainly overcomes, VI-ReID suffers from inter-modality variance caused by the inherent heterogeneity gap. To address this, we present a carefully designed dual Gaussian-based variational auto-encoder (DG-VAE), which disentangles an identity-discriminable and an identity-ambiguous cross-modality feature subspace, following a mixture-of-Gaussians (MoG) prior and a standard Gaussian prior, respectively. Disentangling cross-modality identity-discriminable features leads to more robust retrieval for VI-ReID. To achieve efficient optimization like a conventional VAE, we theoretically derive two variational inference terms for the MoG prior under the supervised setting, which not only restrict the identity-discriminable subspace so that the model explicitly handles the cross-modality intra-identity variance, but also enable the MoG distribution to avoid posterior collapse. Furthermore, we propose a triplet swap reconstruction (TSR) strategy to promote the above disentangling process. Extensive experiments demonstrate that our method outperforms state-of-the-art methods on two VI-ReID datasets. Comment: Accepted by ACM MM 2020 as a poster. 12 pages, 10 appendices.
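
    The dual-prior regularization can be illustrated with the closed-form KL divergence between diagonal Gaussians: the identity-discriminable latent is pulled toward its identity's Gaussian component, while the identity-ambiguous latent is pulled toward a standard normal. The sketch below is a simplification (unit-variance, learnable per-identity component means), not the paper's derivation or code.

```python
# Sketch of the two KL terms behind the dual Gaussian priors (PyTorch).
# class_means, unit-variance components, and the variable names are assumptions.
import torch

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians, summed over dims."""
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.sum(dim=1).mean()

def dual_gaussian_kl(mu_id, logvar_id, mu_amb, logvar_amb, class_means, labels):
    # Identity-discriminable latent: pull each sample toward its identity's MoG component
    # (here a learnable per-identity mean with unit variance, a simplifying assumption).
    mu_p = class_means[labels]          # (B, D) selected component means
    logvar_p = torch.zeros_like(mu_p)   # log(1) = 0 -> unit variance
    kl_id = kl_diag_gaussians(mu_id, logvar_id, mu_p, logvar_p)

    # Identity-ambiguous latent: standard N(0, I) prior, as in a conventional VAE.
    kl_amb = kl_diag_gaussians(mu_amb, logvar_amb,
                               torch.zeros_like(mu_amb), torch.zeros_like(logvar_amb))
    return kl_id, kl_amb
```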

    Anesthesia clinical workload estimated from electronic health record documentation vs billed relative value units

    Get PDF
    IMPORTANCE: Accurate measurements of clinical workload are needed to inform health care policy. Existing methods for measuring clinical workload rely on surveys or time-motion studies, which are labor-intensive to collect and subject to biases. OBJECTIVE: To compare anesthesia clinical workload estimated from electronic health record (EHR) audit log data vs billed relative value units. DESIGN, SETTING, AND PARTICIPANTS: This cross-sectional study of anesthetic encounters occurring between August 26, 2019, and February 9, 2020, used data from 8 academic hospitals, community hospitals, and surgical centers across Missouri and Illinois. Clinicians who provided anesthetic services for at least 1 surgical encounter were included. Data were analyzed from January 2022 to January 2023. EXPOSURE: Anesthetic encounters associated with a surgical procedure were included; encounters associated with labor analgesia and endoscopy were excluded. MAIN OUTCOMES AND MEASURES: For each encounter, EHR-derived clinical workload was estimated as the sum of all EHR actions recorded in the audit log by the anesthesia clinicians who provided care. Billing-derived clinical workload was measured as the total number of units billed for the encounter. A linear mixed-effects model was used to estimate the relative contribution of patient complexity (American Society of Anesthesiologists [ASA] physical status modifier), procedure complexity (ASA base unit value for the procedure), and anesthetic duration (time units) to EHR-derived and billing-derived workload. The resulting β coefficients were interpreted as the expected effect of a 1-unit change in each independent variable on the standardized workload outcome. The analysis plan was developed after the data were obtained. RESULTS: A total of 405 clinicians who provided anesthesia for 31 688 encounters were included in the study. A total of 8 288 132 audit log actions corresponding to 39 131 hours of EHR use were used to measure EHR-derived workload. The contributions of patient complexity, procedural complexity, and anesthesia duration to EHR-derived workload differed significantly from their contributions to billing-derived workload. The contribution of patient complexity toward EHR-derived workload (β = 0.162; 95% CI, 0.153-0.171) was more than 50% greater than its contribution toward billing-derived workload (β = 0.106; 95% CI, 0.097-0.116; P < .001). In contrast, the contribution of procedure complexity toward EHR-derived workload (β = 0.033; 95% CI, 0.031-0.035) was approximately one-third of its contribution toward billing-derived workload (β = 0.106; 95% CI, 0.104-0.108; P < .001). CONCLUSIONS AND RELEVANCE: In this cross-sectional study of 8 hospitals, reimbursement for anesthesiology services overcompensated for procedural complexity and undercompensated for patient complexity. This method for measuring clinical workload could be used to improve reimbursement valuations for anesthesia and other specialties.
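
    A hedged sketch of how such a standardized mixed-effects comparison could be set up with statsmodels is shown below; the column names (ehr_actions, billed_units, asa_status, base_units, time_units, clinician_id), the z-scoring, and the per-clinician random intercept are assumptions about the analysis, not the study's code.

```python
# Sketch: compare standardized betas from two linear mixed-effects models,
# one with EHR-derived workload as the outcome and one with billed units.
# Column names and model structure are assumptions, not the study's schema.
import pandas as pd
import statsmodels.formula.api as smf

def fit_workload_model(df: pd.DataFrame, outcome: str):
    # Z-score outcome and predictors so the fitted betas are comparable across models.
    cols = [outcome, "asa_status", "base_units", "time_units"]
    z = df[cols].apply(lambda s: (s - s.mean()) / s.std())
    z["clinician_id"] = df["clinician_id"].values

    # Fixed effects: patient complexity, procedure complexity, anesthetic duration.
    # Random intercept per clinician (an assumed grouping).
    model = smf.mixedlm(f"{outcome} ~ asa_status + base_units + time_units",
                        data=z, groups=z["clinician_id"])
    return model.fit()

# ehr_fit = fit_workload_model(df, "ehr_actions")    # EHR-derived workload
# bill_fit = fit_workload_model(df, "billed_units")  # billing-derived workload
# Comparing ehr_fit.params with bill_fit.params mirrors the beta comparison above.
```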

    New trends and ideas in visual concept detection

    Full text link
    The MIR Flickr collection consists of 25,000 high-quality photographic images from thousands of Flickr users, made available under the Creative Commons license. The database includes all the original user tags and EXIF metadata. Additionally, detailed and accurate annotations are provided for topics corresponding to the most prominent visual concepts in the user tag data. The rich metadata allow for a wide variety of image retrieval benchmarking scenarios. In this paper, we provide an overview of the various strategies that were devised for automatic visual concept detection using the MIR Flickr collection. In particular, we discuss results from various experiments in combining social data and low-level content-based descriptors to improve the accuracy of visual concept classifiers. Additionally, we present retrieval results.
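
    As a rough sketch of the fusion experiments summarized above, the example below trains a per-concept classifier on the concatenation of low-level visual descriptors and bag-of-tags social features; the feature representations and the logistic-regression choice are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: early fusion of content-based descriptors and social-tag features
# for a single visual-concept classifier. Inputs are assumed to be precomputed.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_concept_classifier(visual_feats: np.ndarray,
                             tag_feats: np.ndarray,
                             concept_labels: np.ndarray) -> LogisticRegression:
    # Concatenate low-level visual descriptors with a bag-of-tags representation.
    fused = np.hstack([visual_feats, tag_feats])
    clf = LogisticRegression(max_iter=1000)
    clf.fit(fused, concept_labels)  # one binary classifier per visual concept
    return clf
```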

    Deep Image Retrieval: A Survey

    Get PDF
    In recent years, a vast amount of visual content has been generated and shared in various fields, such as social media platforms, medical imaging, and robotics. This abundance of content creation and sharing has introduced new challenges. In particular, searching databases for similar content, i.e., content-based image retrieval (CBIR), is a long-established research area in which more efficient and accurate methods are needed for real-time retrieval. Artificial intelligence has made progress in CBIR and has significantly facilitated intelligent search. In this survey, we organize and review recent CBIR works that are developed based on deep learning algorithms and techniques, including insights from recent papers. We identify and present the commonly used benchmarks and evaluation methods, collect common challenges, and propose promising future directions. More specifically, we focus on image retrieval with deep learning and organize the state-of-the-art methods according to the types of deep network structure, deep features, feature enhancement methods, and network fine-tuning strategies. Our survey considers a wide variety of recent methods, aiming to promote a global view of the field of instance-based CBIR. Comment: 20 pages, 11 figures.
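
    As a concrete, hedged example of the instance-level CBIR pipeline the survey covers, the sketch below extracts global descriptors with an off-the-shelf pretrained backbone and ranks a database by cosine similarity; the choice of ResNet-50 and a single pooled descriptor is illustrative, not a recommendation from the survey.

```python
# Sketch: deep-feature CBIR with a pretrained backbone and cosine-similarity ranking.
# The backbone choice and preprocessing assumptions are illustrative.
import torch
import torch.nn.functional as F
import torchvision.models as models

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # keep the pooled 2048-d global descriptor
backbone.eval()

@torch.no_grad()
def embed(images: torch.Tensor) -> torch.Tensor:
    # images: (N, 3, 224, 224), already resized and ImageNet-normalized
    return F.normalize(backbone(images), dim=1)

def search(query_images: torch.Tensor, database_embeddings: torch.Tensor, k: int = 10):
    # database_embeddings: (M, 2048) descriptors precomputed with embed().
    # On L2-normalized vectors, cosine similarity is a plain dot product.
    scores = embed(query_images) @ database_embeddings.T
    return scores.topk(k, dim=1)  # top-k similarities and database indices
```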